20 Gaussian Process

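Recall the bivariate normal distribution: $(Y_{1},Y_{2})\sim N_{2}(\mu,\Sigma)$, where $\mu=\begin{bmatrix}\mu_{1}\\ \mu_{2}\end{bmatrix}$, $\Sigma=\begin{bmatrix}\sigma_{1}^{2} & \rho\sigma_{1}\sigma_{2}\\ \rho\sigma_{1}\sigma_{2} & \sigma_{2}^{2}\end{bmatrix}$.
We are interested in the conditional distribution $Y_{2}\mid Y_{1}=a\sim N(\mu_{2|1},\Sigma_{2|1})$. From Lecture 19, $\mu_{2|1}=\mu_{2}+\rho\frac{\sigma_{2}}{\sigma_{1}}(a-\mu_{1})$ and $\Sigma_{2|1}=\sigma_{2}^{2}(1-\rho^{2})\leq\sigma_{2}^{2}$.
How can we visualize $Y_{1},\dots,Y_{d}$ for $d>2$?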
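
As a quick numerical check of the conditional formulas, here is a minimal sketch in plain Python. The function name `conditional_bivariate_normal` and the parameter values are illustrative choices, not part of the lecture.

```python
def conditional_bivariate_normal(mu1, mu2, sigma1, sigma2, rho, a):
    """Mean and variance of Y2 | Y1 = a for a bivariate normal (Y1, Y2)."""
    # mu_{2|1} = mu_2 + rho * (sigma_2 / sigma_1) * (a - mu_1)
    cond_mean = mu2 + rho * sigma2 / sigma1 * (a - mu1)
    # Sigma_{2|1} = sigma_2^2 * (1 - rho^2), never larger than sigma_2^2
    cond_var = sigma2 ** 2 * (1.0 - rho ** 2)
    return cond_mean, cond_var


# Illustrative (assumed) parameter values.
mean, var = conditional_bivariate_normal(
    mu1=0.0, mu2=1.0, sigma1=1.0, sigma2=2.0, rho=0.5, a=2.0
)
print(mean, var)
```

Observing $Y_{1}$ shifts the mean of $Y_{2}$ in the direction of the correlation and shrinks its variance by the factor $1-\rho^{2}$.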
A Gaussian Process generalizes this concept to functions $Y:S\to\mathbb{R}$ with $x\in S\subseteq\mathbb{R}$, where $S$ can contain infinitely many points.

Stochastic Process

A Stochastic Process is a collection of random variables $\{Y(x),\ x\in S\}$, where $S$ is the index set (for example, time).

Gaussian Process

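A Gaussian Process (GP) is a stochastic process such that any finite collection $\{Y(x_{1}),\dots,Y(x_{n})\}$ is multivariate normal. We write $Y(\cdot)\sim GP(m(\cdot),k(\cdot,\cdot))$, where $m$ is called the mean function and $k$ is called the covariance function (or kernel): $E[Y(x)]=m(x)$, $\mathrm{Cov}(Y(x),Y(x'))=k(x,x')$.
Moreover, define $m=\begin{bmatrix}m(x_{1})\\ \vdots\\ m(x_{n})\end{bmatrix}$, $K=\begin{bmatrix}k(x_{1},x_{1}) & \cdots & k(x_{1},x_{n})\\ \vdots & \ddots & \vdots\\ k(x_{n},x_{1}) & \cdots & k(x_{n},x_{n})\end{bmatrix}$.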

Example: Radial Basis Function (RBF)

$K_{\mathrm{RBF}}(x,x')=\sigma^{2}\exp\left\{-\frac{1}{2l^{2}}|x-x'|^{2}\right\}$, where $l>0$ is called the length scale.
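
To make the kernel concrete, the following sketch builds the matrix $K$ for the RBF kernel on a small grid using NumPy. The function name `rbf_kernel`, the grid, and the values of $\sigma$ and $l$ are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, x_prime, sigma=1.0, length_scale=1.0):
    """RBF kernel matrix with entries sigma^2 * exp(-|x_i - x'_j|^2 / (2 l^2))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)              # column vector
    x_prime = np.asarray(x_prime, dtype=float).reshape(1, -1)  # row vector
    sq_dist = (x - x_prime) ** 2                               # pairwise squared distances
    return sigma ** 2 * np.exp(-sq_dist / (2.0 * length_scale ** 2))


# Illustrative (assumed) grid and hyperparameters.
xs = np.linspace(0.0, 1.0, 5)
K = rbf_kernel(xs, xs, sigma=1.0, length_scale=0.3)
print(np.round(K, 3))
```

The diagonal equals $\sigma^{2}$, and entries decay as the points move apart. A smaller $l$ makes the decay faster, so sampled functions vary more quickly.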

Sampling procedure:

  1. Discretize $S$ as $\{x_{1},\dots,x_{D}\}$.
  2. Sample $Y(x_{1})\sim N(m(x_{1}),k(x_{1},x_{1}))$.
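  3. For $n=1,\dots,D-1$, $(Y(x_{1}),\dots,Y(x_{n+1}))\sim N_{n+1}(m,K)$, where $m=\begin{bmatrix}m(x_{1})\\ \vdots\\ m(x_{n})\\ m(x_{n+1})\end{bmatrix}$, $K=\begin{bmatrix}k(x_{1},x_{1}) & \cdots & k(x_{1},x_{n}) & k(x_{1},x_{n+1})\\ \vdots & \ddots & \vdots & \vdots\\ k(x_{n},x_{1}) & \cdots & k(x_{n},x_{n}) & k(x_{n},x_{n+1})\\ k(x_{n+1},x_{1}) & \cdots & k(x_{n+1},x_{n}) & k(x_{n+1},x_{n+1})\end{bmatrix}$.
    Denote $\vec{C}_{n+1}=[k(x_{1},x_{n+1})\ \cdots\ k(x_{n},x_{n+1})]^{\mathrm{T}}$, and let $K_{n}$ be the top-left $n\times n$ block of $K$.
    Sample $Y(x_{n+1})$ from $Y(x_{n+1})\mid Y(x_{1}),\dots,Y(x_{n})\sim N(\mu_{n+1},\sigma_{n+1}^{2})$, where
    $$\begin{align*}
    \mu_{n+1}&=m(x_{n+1})+\vec{C}_{n+1}^{\mathrm{T}}K_{n}^{-1}\begin{bmatrix}
    Y(x_{1})-m(x_{1}) \\ \vdots \\ Y(x_{n})-m(x_{n})
    \end{bmatrix},\\
    \sigma_{n+1}^{2}&= k(x_{n+1},x_{n+1})-\vec{C}_{n+1}^{\mathrm{T}}K_{n}^{-1}\vec{C}_{n+1}.
    \end{align*}$$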
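
The sequential sampling procedure above can be sketched in NumPy as follows. This is an illustrative sketch, not a reference implementation. The names `rbf_kernel` and `sample_gp_sequentially`, the zero mean function, the grid, and the small `jitter` added to the diagonal for numerical stability are all assumptions that do not come from the lecture.

```python
import numpy as np


def rbf_kernel(x, x_prime, sigma=1.0, length_scale=1.0):
    """RBF kernel matrix with entries sigma^2 * exp(-|x_i - x'_j|^2 / (2 l^2))."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    x_prime = np.asarray(x_prime, dtype=float).reshape(1, -1)
    return sigma ** 2 * np.exp(-((x - x_prime) ** 2) / (2.0 * length_scale ** 2))


def sample_gp_sequentially(xs, mean_fn, kernel_fn, rng, jitter=1e-6):
    """Draw one GP path on the grid xs, one point at a time."""
    xs = np.asarray(xs, dtype=float)
    m = mean_fn(xs)
    # Assumption: a small jitter keeps K_n numerically invertible.
    K = kernel_fn(xs, xs) + jitter * np.eye(len(xs))
    y = np.empty(len(xs))

    # Step 2: Y(x_1) ~ N(m(x_1), k(x_1, x_1)).
    y[0] = rng.normal(m[0], np.sqrt(K[0, 0]))

    # Step 3: sample Y(x_{n+1}) given Y(x_1), ..., Y(x_n).
    for n in range(1, len(xs)):
        K_n = K[:n, :n]                  # top-left n x n block
        C = K[:n, n]                     # vector C_{n+1}
        w = np.linalg.solve(K_n, C)      # K_n^{-1} C_{n+1}, without forming the inverse
        mu = m[n] + w @ (y[:n] - m[:n])  # conditional mean mu_{n+1}
        var = K[n, n] - w @ C            # conditional variance sigma_{n+1}^2
        y[n] = rng.normal(mu, np.sqrt(max(var, 0.0)))
    return y


# Illustrative (assumed) setup: zero mean, RBF kernel, 30 grid points.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 30)
y = sample_gp_sequentially(
    xs,
    mean_fn=lambda x: np.zeros_like(x),
    kernel_fn=lambda a, b: rbf_kernel(a, b, sigma=1.0, length_scale=1.0),
    rng=rng,
)
print(np.round(y[:5], 3))
```

Solving the linear system with `np.linalg.solve` avoids forming $K_{n}^{-1}$ explicitly. In practice one would often draw the whole vector at once from $N_{D}(m,K)$ through a Cholesky factor, which gives the same distribution as the sequential scheme.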

Common kernels: